Selecting the Appropriate Ensemble Learning Approach for Balanced Bioinformatics Data

Authors

  • David J. Dittman
  • Taghi M. Khoshgoftaar
  • Amri Napolitano
Abstract

Ensemble learning (the process of combining multiple models into a single decision) is an effective tool for improving the classification performance of inductive models. While ensemble learning is well suited to domains like bioinformatics, which have many challenging datasets, many ensemble methods, such as Bagging and Boosting, do not take into account the high dimensionality (large number of features per instance) commonly found in bioinformatics datasets. This work observes the effects of two relatively new ensemble learning methods, Select-Bagging and Select-Boosting (the Bagging and Boosting approaches with feature selection performed within each iteration of their algorithms), on a series of seven balanced bioinformatics datasets (a minority class distribution greater than 43.50%). We also include the results obtained when no ensemble approach is used (denoted No-Ensemble) so that the full effects of ensemble learning can be observed. To test the three approaches we use three feature rankers, four feature subset sizes, and two classifiers. The results show that Select-Bagging is the top-performing ensemble approach, and statistical analysis confirms that Select-Bagging is significantly better than No-Ensemble and better (though not significantly) than Select-Boosting. Our recommendation is that Select-Bagging is an excellent choice for improving classification performance on bioinformatics datasets. To our knowledge, this work is the first empirical study focused exclusively on balanced bioinformatics datasets that investigates the effects of ensemble learning, utilizes Select-Bagging, and introduces Select-Boosting.
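The abstract describes Select-Bagging only as Bagging with feature selection carried out inside each iteration. The following is a minimal sketch of that idea, not the authors' implementation. The ranker (gap between class means), the base learner (nearest centroid), the majority vote, and all function names are assumptions chosen for illustration, since the abstract does not name the three rankers or two classifiers used in the study.

```python
import random
from collections import Counter


def rank_features(X, y, k):
    """Return indices of the k features with the largest gap between class means.
    Stand-in ranker (assumption); the abstract does not name the rankers used."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, features):
    """Nearest-centroid learner on the selected features (stand-in classifier)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids


def predict_one(model, row):
    """Predict the label whose centroid is closest in the selected feature space."""
    features, centroids = model

    def squared_distance(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))

    return min(centroids, key=squared_distance)


def select_bagging_fit(X, y, n_iterations=11, k=2, seed=0):
    """Each iteration: draw a bootstrap sample, select features on that sample,
    then train the base learner on the reduced sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    while len(models) < n_iterations:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # redraw if the bootstrap sample lost a class
        features = rank_features(Xb, yb, k)  # selection happens per iteration
        models.append((features, fit_centroids(Xb, yb, features)))
    return models


def select_bagging_predict(models, row):
    """Combine the per-iteration models by majority vote."""
    votes = Counter(predict_one(m, row) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=20, n_features=20, seed=1):
    """Balanced synthetic data; only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 4 * label
            row[1] -= 4 * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
models = select_bagging_fit(X, y, n_iterations=11, k=2)
print(select_bagging_predict(models, [4.0, -4.0] + [0.0] * 18))
```

Because selection is repeated on every bootstrap sample, each member of the ensemble may end up using a different feature subset. An odd number of iterations avoids tied votes in the two-class case.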

Similar resources

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...

Ensemble Gene Selection Versus Single Gene Selection: Which Is Better?

One of the major challenges in bioinformatics is selecting the appropriate genes for a given problem, and moreover, choosing the best gene selection technique for this task. Many such techniques have been developed, each with its own characteristics and complexities. Recently, some works have addressed this by introducing ensemble gene selection, which is the process of performing multiple runs...

Publication date: 2015